29 research outputs found

    RIME: Repeat Identification

    Get PDF
    We present an algorithm for detecting long similar fragments occurring at least twice in a set of biological sequences. The problem becomes computationally challenging when the frequency of a repeat is allowed to increase and when a non-negligible number of insertions, deletions and substitutions are allowed. We introduce in this paper an algorithm, Rime1 1 Rime is also a reference to Coleridge's poem "The Rime of an Ancient Mariner" which contains many repetitions as a poetic device. (for Repeat Identification: long, Multiple, and with Edits) that performs this task, and manages instances whose size and combination of parameters cannot be handled by other currently existing methods. This is achieved by using a filter as a preprocessing step, and by then exploiting the information gathered by the filter in the following actual repeat inference step. To the best of our knowledge, Rime is the first algorithm that can accurately deal with very long repeats (up to a few thousands), occurring possibly several times, and with a rate of differences (substitutions and indels) allowed among copies of a same repeat of 10-15% or even more

    EXMOTIF: efficient structured motif extraction

    Get PDF
    BACKGROUND: Extracting motifs from sequences is a mainstay of bioinformatics. We look at the problem of mining structured motifs, which allow variable length gaps between simple motif components. We propose an efficient algorithm, called EXMOTIF, that given some sequence(s), and a structured motif template, extracts all frequent structured motifs that have quorum q. Potential applications of our method include the extraction of single/composite regulatory binding sites in DNA sequences. RESULTS: EXMOTIF is efficient in terms of both time and space and is shown empirically to outperform RISO, a state-of-the-art algorithm. It is also successful in finding potential single/composite transcription factor binding sites. CONCLUSION: EXMOTIF is a useful and efficient tool in discovering structured motifs, especially in DNA sequences. The algorithm is available as open-source at:

    A polynomial time biclustering algorithm for finding approximate expression patterns in gene expression time series

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>The ability to monitor the change in expression patterns over time, and to observe the emergence of coherent temporal responses using gene expression time series, obtained from microarray experiments, is critical to advance our understanding of complex biological processes. In this context, biclustering algorithms have been recognized as an important tool for the discovery of local expression patterns, which are crucial to unravel potential regulatory mechanisms. Although most formulations of the biclustering problem are NP-hard, when working with time series expression data the interesting biclusters can be restricted to those with contiguous columns. This restriction leads to a tractable problem and enables the design of efficient biclustering algorithms able to identify all maximal contiguous column coherent biclusters.</p> <p>Methods</p> <p>In this work, we propose <it>e</it>-CCC-Biclustering, a biclustering algorithm that finds and reports all maximal contiguous column coherent biclusters with approximate expression patterns in time polynomial in the size of the time series gene expression matrix. This polynomial time complexity is achieved by manipulating a discretized version of the original matrix using efficient string processing techniques. We also propose extensions to deal with missing values, discover anticorrelated and scaled expression patterns, and different ways to compute the errors allowed in the expression patterns. We propose a scoring criterion combining the statistical significance of expression patterns with a similarity measure between overlapping biclusters.</p> <p>Results</p> <p>We present results in real data showing the effectiveness of <it>e</it>-CCC-Biclustering and its relevance in the discovery of regulatory modules describing the transcriptomic expression patterns occurring in <it>Saccharomyces cerevisiae </it>in response to heat stress. In particular, the results show the advantage of considering approximate patterns when compared to state of the art methods that require exact matching of gene expression time series.</p> <p>Discussion</p> <p>The identification of co-regulated genes, involved in specific biological processes, remains one of the main avenues open to researchers studying gene regulatory networks. The ability of the proposed methodology to efficiently identify sets of genes with similar expression patterns is shown to be instrumental in the discovery of relevant biological phenomena, leading to more convincing evidence of specific regulatory mechanisms.</p> <p>Availability</p> <p>A prototype implementation of the algorithm coded in Java together with the dataset and examples used in the paper is available in <url>http://kdbio.inesc-id.pt/software/e-ccc-biclustering</url>.</p

    GRISOTTO: A greedy approach to improve combinatorial algorithms for motif discovery with prior knowledge

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Position-specific priors (PSP) have been used with success to boost EM and Gibbs sampler-based motif discovery algorithms. PSP information has been computed from different sources, including orthologous conservation, DNA duplex stability, and nucleosome positioning. The use of prior information has not yet been used in the context of combinatorial algorithms. Moreover, priors have been used only independently, and the gain of combining priors from different sources has not yet been studied.</p> <p>Results</p> <p>We extend RISOTTO, a combinatorial algorithm for motif discovery, by post-processing its output with a greedy procedure that uses prior information. PSP's from different sources are combined into a scoring criterion that guides the greedy search procedure. The resulting method, called GRISOTTO, was evaluated over 156 yeast TF ChIP-chip sequence-sets commonly used to benchmark prior-based motif discovery algorithms. Results show that GRISOTTO is at least as accurate as other twelve state-of-the-art approaches for the same task, even without combining priors. Furthermore, by considering combined priors, GRISOTTO is considerably more accurate than the state-of-the-art approaches for the same task. We also show that PSP's improve GRISOTTO ability to retrieve motifs from mouse ChiP-seq data, indicating that the proposed algorithm can be applied to data from a different technology and for a higher eukaryote.</p> <p>Conclusions</p> <p>The conclusions of this work are twofold. First, post-processing the output of combinatorial algorithms by incorporating prior information leads to a very efficient and effective motif discovery method. Second, combining priors from different sources is even more beneficial than considering them separately.</p

    Crystal Structure of the Formin mDia1 in Autoinhibited Conformation

    Get PDF
    Formin proteins utilize a conserved formin homology 2 (FH2) domain to nucleate new actin filaments. In mammalian diaphanous-related formins (DRFs) the FH2 domain is inhibited through an unknown mechanism by intramolecular binding of the diaphanous autoinhibitory domain (DAD) and the diaphanous inhibitory domain (DID).Here we report the crystal structure of a complex between DID and FH2-DAD fragments of the mammalian DRF, mDia1 (mammalian diaphanous 1 also called Drf1 or p140mDia). The structure shows a tetrameric configuration (4 FH2 + 4 DID) in which the actin-binding sites on the FH2 domain are sterically occluded. However biochemical data suggest the full-length mDia1 is a dimer in solution (2 FH2 + 2 DID). Based on the crystal structure, we have generated possible dimer models and found that architectures of all of these models are incompatible with binding to actin filament but not to actin monomer. Furthermore, we show that the minimal functional monomeric unit in the FH2 domain, termed the bridge element, can be inhibited by isolated monomeric DID. NMR data on the bridge-DID system revealed that at least one of the two actin-binding sites on the bridge element is accessible to actin monomer in the inhibited state.Our findings suggest that autoinhibition in the native DRF dimer involves steric hindrance with the actin filament. Although the structure of a full-length DRF would be required for clarification of the presented models, our work here provides the first structural insights into the mechanism of the DRF autoinhibition

    Septation of Infectious Hyphae Is Critical for Appressoria Formation and Virulence in the Smut Fungus Ustilago Maydis

    Get PDF
    Differentiation of hyphae into specialized infection structures, known as appressoria, is a common feature of plant pathogenic fungi that penetrate the plant cuticle. Appressorium formation in U. maydis is triggered by environmental signals but the molecular mechanism of this hyphal differentiation is largely unknown. Infectious hyphae grow on the leaf surface by inserting regularly spaced retraction septa at the distal end of the tip cell leaving empty sections of collapsed hyphae behind. Here we show that formation of retraction septa is critical for appressorium formation and virulence in U. maydis. We demonstrate that the diaphanous-related formin Drf1 is necessary for actomyosin ring formation during septation of infectious hyphae. Drf1 acts as an effector of a Cdc42 GTPase signaling module, which also consists of the Cdc42-specific guanine nucleotide exchange factor Don1 and the Ste20-like kinase Don3. Deletion of drf1, don1 or don3 abolished formation of retraction septa resulting in reduced virulence. Appressorium formation in these mutants was not completely blocked but infection structures were found only at the tip of short filaments indicating that retraction septa are necessary for appressorium formation in extended infectious hyphae. In addition, appressoria of drf1 mutants penetrated the plant tissue less frequently

    Genome Sequence of the Pea Aphid Acyrthosiphon pisum

    Get PDF
    Aphids are important agricultural pests and also biological models for studies of insect-plant interactions, symbiosis, virus vectoring, and the developmental causes of extreme phenotypic plasticity. Here we present the 464 Mb draft genome assembly of the pea aphid Acyrthosiphon pisum. This first published whole genome sequence of a basal hemimetabolous insect provides an outgroup to the multiple published genomes of holometabolous insects. Pea aphids are host-plant specialists, they can reproduce both sexually and asexually, and they have coevolved with an obligate bacterial symbiont. Here we highlight findings from whole genome analysis that may be related to these unusual biological features. These findings include discovery of extensive gene duplication in more than 2000 gene families as well as loss of evolutionarily conserved genes. Gene family expansions relative to other published genomes include genes involved in chromatin modification, miRNA synthesis, and sugar transport. Gene losses include genes central to the IMD immune pathway, selenoprotein utilization, purine salvage, and the entire urea cycle. The pea aphid genome reveals that only a limited number of genes have been acquired from bacteria; thus the reduced gene count of Buchnera does not reflect gene transfer to the host genome. The inventory of metabolic genes in the pea aphid genome suggests that there is extensive metabolite exchange between the aphid and Buchnera, including sharing of amino acid biosynthesis between the aphid and Buchnera. The pea aphid genome provides a foundation for post-genomic studies of fundamental biological questions and applied agricultural problems
    corecore